Benchmarking Declarative Approximate Selection Predicates
Author: Oktie Hassanzadeh
Abstract
Benchmarking Declarative Approximate Selection Predicates. Master of Science thesis, Graduate Department of Computer Science, University of Toronto, 2007.
Declarative data quality has been an active research topic. The fundamental principle behind a declarative approach to data quality is the use of declarative statements to realize data quality primitives on top of any relational data source. A primary advantage of such an approach is its ease of use and integration with existing applications. Over the last couple of years, several similarity predicates have been proposed for common quality primitives (approximate selections, joins, etc.) and have been fully expressed using declarative SQL statements. In this thesis, new similarity predicates are proposed along with their declarative realization, based on notions of probabilistic information retrieval. Then, full declarative specifications of similarity predicates previously proposed in the literature are presented, grouped into classes according to their primary characteristics. Finally, a thorough performance and accuracy study is performed, comparing a large number of similarity predicates for data cleaning operations.
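To make the idea of a declarative approximate selection concrete, the following is a minimal sketch and not one of the thesis's own predicates: it scores each stored string against a query string by Jaccard similarity over q-grams, with the scoring and thresholding written as a single SQL statement. The table names (`base_tokens`, `query_tokens`), the helper functions, the `#` padding character, and the choice of Jaccard over 2-grams are all assumptions made for illustration; SQLite is used only because it ships with Python.

```python
import sqlite3


def qgrams(s, q=2):
    """Return the set of q-grams of s, lowercased and padded with '#'."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def build_db(records, q=2):
    """Load the distinct q-grams of each record into a hypothetical token table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE base_tokens (tid INTEGER, token TEXT)")
    conn.execute("CREATE TABLE query_tokens (token TEXT)")
    for tid, text in enumerate(records):
        conn.executemany(
            "INSERT INTO base_tokens VALUES (?, ?)",
            [(tid, g) for g in qgrams(text, q)],
        )
    return conn


# Jaccard = |A n B| / (|A| + |B| - |A n B|), computed per record in SQL.
# Tokens are distinct per record and per query, so after the LEFT JOIN
# COUNT(*) is |A| and COUNT(q.token) is |A n B|.
JACCARD_SQL = """
SELECT tid, score FROM (
    SELECT b.tid AS tid,
           CAST(COUNT(q.token) AS REAL)
             / (COUNT(*) + (SELECT COUNT(*) FROM query_tokens)
                - COUNT(q.token)) AS score
    FROM base_tokens b
    LEFT JOIN query_tokens q ON b.token = q.token
    GROUP BY b.tid
)
WHERE score >= ?
ORDER BY score DESC, tid
"""


def approximate_select(conn, query, threshold, q=2):
    """Return (tid, score) rows whose similarity to query meets the threshold."""
    conn.execute("DELETE FROM query_tokens")
    conn.executemany(
        "INSERT INTO query_tokens VALUES (?)",
        [(g,) for g in qgrams(query, q)],
    )
    return conn.execute(JACCARD_SQL, (threshold,)).fetchall()


if __name__ == "__main__":
    records = [
        "University of Toronto",
        "Univ. of Toronto",
        "Toronto University",
        "McGill University",
    ]
    conn = build_db(records)
    # The query contains a deliberate misspelling.
    for tid, score in approximate_select(conn, "Universty of Toronto", 0.5):
        print(tid, records[tid], round(score, 3))
```

Only the tokenization happens in Python here; the similarity computation and the selection are a plain SQL query, so the same statement could in principle run on any relational source that holds the token tables.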
Similar resources
Towards the evaluation of the LarKC Reasoner Plug-ins
In this paper, we present an initial framework of evaluation and benchmarking of reasoners deployed within the LarKC platform, a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. We discuss the evaluation methods, measures, benchmarks, and performance targets for the plug-ins to be develo...
The Efficacy of Procedural and Declarative Learning Strategies on EFL Students’ Oral Proficiency
Style and strategies in EFL learning contexts and the effects of task types were explored to enhance language learning strategies. Using a quantitative pre-test, post-test design and interviews, this study investigated the effects of procedural and declarative learning strategies on EFL learners’ acquisition of English past tense performing narrative tasks. The participants were 396 male and fe...
DataSynth: Generating Synthetic Data using Declarative Constraints
A variety of scenarios such as database system and application testing, data masking, and benchmarking require synthetic database instances, often having complex data characteristics. We present DataSynth, a flexible tool for generating synthetic databases. DataSynth uses a simple and powerful declarative abstraction based on cardinality constraints to specify data characteristics, and uses sop...
Declarative generation of synthetic XML data
Synthetic data can be extremely useful in testing and evaluating algorithms, tools and systems. Most synthetic data generators available today are the result of individual benchmarking efforts. Typically, these are complex programs in which the specifications of both the structure and the contents of the data are hard-coded. As a result, it is often difficult to customize these tools for produc...
Decoys Selection in Benchmarking Datasets: Overview and Perspectives
Virtual Screening (VS) is designed to prospectively help identify potential hits, i.e., compounds capable of interacting with a given target and potentially modulating its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is best adapted to the query/target system under study and that yields the most reliable output. ...